Background: what’s out there (visualization tools), why this is useful (there are not many detailed examples showing the code; talk about your experience at Sunbelt, “what’s the format of the data”; look for papers on computing literacy), and our goal (start-to-finish network visualization: load the data, process it a little, and plot it).
Talk about the different aspects of network visualization the user needs to consider: layout, vertex size, vertex color, vertex shape, edges, edge width, etc. Talk about the different components and how we can use them (for example, what each can represent). Also cover the size of the network and the type of network (egocentric, small, large, bipartite, etc.).
In terms of layouts, what are the things we need to consider? (We can mention R packages that implement layouts.)
Network visualization has many aspects that need to be considered for the visualization to get the necessary information across effectively. One aspect to consider is the layout of the data.
Vertex Size: Another aspect to consider is the size of the vertices.
Vertex color: The color of the vertex is also an important consideration. Color can make objects more visually appealing, but it is also useful for differentiating objects or levels of an object (Ognyanova, 2019). It can also help in visualizing groups and patterns or in detecting clusters (Tyner, 2017).
Vertex shape:
Edges:
Edge width:
For the first example, we will use a data set from the paper titled “Estimates of Social Contact in a Middle School Based on Self-Report and Wireless Sensor Data” by Leecaster et al. The data set we are using explores 7th and 8th grade students, and we will focus on the connections between those students. We have identifiers such as gender, lunch period, and grade that we can work with.
We will use this data set to explore what netplot can do in terms of data visualization.
First, the data needs to be pulled in. After we pull it in, let’s get a glimpse of what the data looks like.
# attaching packages
library(igraph)
library(data.table)
library(devtools)
install_github("USCCANA/netplot")
library(netplot)
# loading and cleaning data
students <- fread("./data/middle_school/pone.0153690.s001.csv")
interactions <- fread("./data/middle_school/pone.0153690.s003.csv")
print(students)
## id grade gender unique lunch initialsNum
## 1: 2003 7 0 0 1 386
## 2: 2004 8 1 1 1 402
## 3: 2006 7 1 1 2 288
## 4: 2008 8 0 1 1 199
## 5: 2009 7 1 0 1 147
## ---
## 674: NA 8 0 0 99 171
## 675: NA 8 0 1 99 270
## 676: NA 8 0 1 99 327
## 677: NA 99 1 0 99 378
## 678: NA 7 1 0 99 277
print(interactions)
## id contactGender contactGrade contactId ClassPeriod contactInitialNum
## 1: 2004 1 8 3127 4 323
## 2: 2004 0 8 2620 1 335
## 3: 2004 1 8 99 1 401
## 4: 2004 1 8 99 9 401
## 5: 2004 1 8 99 9 401
## ---
## 10777: 3448 1 7 99 4 79
## 10778: 3448 1 7 99 2 17
## 10779: 3448 1 7 99 4 17
## 10780: 3448 1 7 3439 3 155
## 10781: 3448 1 7 99 3 294
Before using the data, we need to remove the missing values (NA) and miscoded entries from both data sets. We also see a large number of students who have no recorded interactions with anyone else during the day. These “isolates” need to be removed so that the graph is easier to read.
# filtering out 'N/A's in the 'students' data frame
students <- students[!is.na(id)]
# filtering down to gender being "0" or "1"
students <- students[gender %in% c("0", "1")]
# filter out 'N/A's in 'id' and 'contactId'
interactions <- interactions[!is.na(id) & !is.na(contactId)]
# ids of the students that survived the cleaning above
ids <- sort(unique(students$id))
# drop interactions with an unknown endpoint
# (this narrows our data from 10781 to 5150 rows)
interactions <- interactions[(id %in% ids) & (contactId %in% ids)]
source(file = "./misc/color_nodes_function.R")
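The filtering logic above can be checked on a toy example. The following is a minimal sketch in base R (rather than data.table) with made-up values; the objects `toy_students` and `toy_interactions` are hypothetical and only illustrate how rows with missing or unknown ids drop out.

```r
# Toy data (hypothetical values, not taken from the study)
toy_students <- data.frame(
  id     = c(1, 2, 3, NA),
  gender = c(0, 1, 99, 1)          # 99 stands in for a miscoded value
)
toy_interactions <- data.frame(
  id        = c(1, 1, 2, 3, 2),
  contactId = c(2, 99, 1, 1, NA)   # 99 and NA are not valid student ids
)

# Keep students with a known id and a valid gender code
toy_students <- toy_students[!is.na(toy_students$id) &
                               toy_students$gender %in% c(0, 1), ]

# Keep interactions whose two endpoints are both retained students
valid_ids <- sort(unique(toy_students$id))
keep <- toy_interactions$id %in% valid_ids &
  toy_interactions$contactId %in% valid_ids
toy_interactions <- toy_interactions[keep, ]

print(toy_interactions)
```

Only the interactions between students 1 and 2 remain, because every other row touches an id that was removed or never existed.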
Next, the two data sets need to be combined into a single graph object.
## Creating a graph from the two data sets
net <- graph_from_data_frame(
d = interactions[, .(id, contactId)],
directed = FALSE, vertices = as.data.frame(students)
)
## Getting only connected individuals
net_with_no_isolates <- induced_subgraph(net, which(degree(net) > 0))
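To make the idea of an isolate concrete, here is a minimal base R sketch that computes vertex degree directly from an edge list. The data are hypothetical, and this is only an illustration of the concept, not how igraph computes `degree()` internally.

```r
# Toy undirected edge list among vertices "a" to "e" (hypothetical data)
vertices <- c("a", "b", "c", "d", "e")
edges <- data.frame(
  from = c("a", "a", "b"),
  to   = c("b", "c", "c"),
  stringsAsFactors = FALSE
)

# Degree = number of edge endpoints that touch each vertex
endpoints <- factor(c(edges$from, edges$to), levels = vertices)
deg <- table(endpoints)

# Isolates have degree zero; every other vertex is kept
isolates  <- names(deg)[deg == 0]
connected <- names(deg)[deg > 0]

print(deg)
print(isolates)
```

Vertices "d" and "e" never appear in the edge list, so they are the isolates that a filter such as `degree(net) > 0` would drop.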
Finally, we plot the network using the default settings.
## Plot with no isolates
set.seed(3)
nplot(
net_with_no_isolates
)
Next, we customize several aspects of the plot. To work with the “color_nodes” function, we first need to convert “grade” from numeric to a factor. We also choose the colors we would like the nodes to have.
## adjust 'grade' to factor
V(net_with_no_isolates)$grade <- as.factor(V(net_with_no_isolates)$grade)
# plotting connections among grades ####
set.seed(3)
a_colors <- color_nodes(net_with_no_isolates,"grade", c("gray40","red3"))
attr(a_colors, "map")
## 7 8
## "#666666" "#CD0000"
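The “color_nodes” helper is sourced from a file that is not shown here. As a rough illustration of the idea only, the sketch below defines a hypothetical stand-in, `color_by_group()`, that works on a plain vector instead of a graph. It is an assumption about the general approach, not the actual helper.

```r
# Hypothetical stand-in for a helper like color_nodes(): maps each level of a
# grouping variable to one color and records the mapping in a "map" attribute.
color_by_group <- function(values, colors) {
  values <- as.factor(values)
  lv <- levels(values)
  stopifnot(length(colors) >= length(lv))

  # Convert color names to hex codes so the mapping is unambiguous
  hex <- rgb(t(col2rgb(colors[seq_along(lv)])), maxColorValue = 255)
  names(hex) <- lv

  # One color per element, plus the level-to-color lookup as an attribute
  out <- unname(hex[as.character(values)])
  attr(out, "map") <- hex
  out
}

grade <- c(7, 8, 8, 7)
cols <- color_by_group(grade, c("gray40", "red3"))
attr(cols, "map")
```

Keeping the level-to-color mapping as an attribute makes it easy to build a legend later without recomputing the assignment.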
Now, we are able to create a plot of the data. This is the same data that we used to create the plot above, but now adjustments to the nodes will be made.
Color the vertices (‘vertex.color’) according to the grade the student is in (with 7th graders being gray and 8th graders being red).
Adjust the shape of the vertices (‘vertex.nsides’). 7th graders are drawn with 10 sides, which reads as a circle, and all other students are drawn as triangles.
Adjust size of vertices (‘vertex.size.range’).
Remove the labels of the nodes.
set.seed(3)
grades <- nplot(
net_with_no_isolates,
vertex.color = color_nodes(net_with_no_isolates, "grade", c("gray40","red3")),
vertex.nsides = ifelse(V(net_with_no_isolates)$grade == 7, 10, 3),
vertex.size.range = c(0.015, 0.020),
vertex.label = NULL)
print(grades)
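The reason a 10-sided vertex reads as a circle while a 3-sided one reads as a triangle can be seen with a little geometry. The sketch below is a base R illustration under our own assumptions; `regular_polygon()` and `area_ratio()` are hypothetical helpers, not part of netplot, and this is not how netplot draws vertices internally.

```r
# Hypothetical helper: corner coordinates of a regular polygon with n sides,
# inscribed in a circle of radius r centered at the origin.
regular_polygon <- function(n, r = 1) {
  angles <- 2 * pi * (seq_len(n) - 1) / n
  data.frame(x = r * cos(angles), y = r * sin(angles))
}

# Area of a regular n-gon relative to the area of its enclosing circle
area_ratio <- function(n) (n / 2) * sin(2 * pi / n) / pi

head(regular_polygon(10), 3)
round(area_ratio(c(3, 10)), 2)
```

A triangle fills well under half of its enclosing circle, while a 10-sided polygon fills more than nine tenths of it, which is why it is hard to tell apart from a circle at small vertex sizes.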
This looks good, but let’s alter the parameters we just set to give the plot a different look.
Change vertex.color to be tied to a color palette.
Adjust vertex.nsides to make 7th graders octagons and 8th graders hexagons.
Adjust vertex.size.range to make each vertex smaller.
Add and adjust vertex labels with the vertex.label.* arguments:
vertex.label.fontsize adjusts the font size.
vertex.label.show adjusts the proportion of labels to keep.
Adjust vertex.frame.color to give each vertex an outline.
library(igraph)
library(RColorBrewer)
# Create a color palette using RColorBrewer
palette <- brewer.pal(3, "Set1") # Change the number and palette name as needed
set.seed(3)
grades <- nplot(
net_with_no_isolates,
vertex.color = color_nodes(net_with_no_isolates, "grade", palette),
vertex.nsides = ifelse(V(net_with_no_isolates)$grade == 7, 8, 6),
vertex.size.range = c(0.01, 0.011),
vertex.label.fontsize = 10,
vertex.label.show = .25,
vertex.frame.color = "black")
print(grades)
Now that we have explored a bit about vertices, let’s dive into options related to edges.
Change edge.width.range to make the size of the edges wider or thinner.
Change edge.color to blue.
Change edge.color.alpha to adjust transparency.
set.seed(3)
grades <- nplot(
net_with_no_isolates,
vertex.label=NULL,
edge.width.range = c(.25,1),
edge.color = "dodgerblue4",
edge.color.alpha = .33)
print(grades)
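To see what a transparency level of .33 means for a color, the following base R sketch uses `adjustcolor()` from grDevices. This only illustrates the concept of an alpha channel; it is an assumption-free base R call, but it may not be how netplot applies `edge.color.alpha` internally.

```r
# Convert a named color plus an alpha level into an "#RRGGBBAA" hex string
edge_col <- adjustcolor("dodgerblue4", alpha.f = 0.33)

# Inspect the red, green, blue, and alpha channels (0 to 255 scale)
channels <- col2rgb(edge_col, alpha = TRUE)

print(edge_col)
print(channels)
```

An alpha of 0 is fully transparent and 255 is fully opaque, so a value near one third of 255 lets overlapping edges show through each other in dense regions of the graph.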
Now, let’s adjust everything again, showing some of the things that netplot can do with edges.
Adjust edge.color so that each edge blends from the color of one endpoint to the color of the other.
Adjust edge.curvature to make the edges straight lines.
Adjust edge.line.lty to make edges long dashes.
set.seed(3)
grades <- nplot(
net_with_no_isolates,
vertex.label=NULL,
edge.width.range = c(1,1),
vertex.color = color_nodes(net_with_no_isolates, "grade", c("blue","red3")),
edge.color = ~ego(alpha = 0.5) + alter(alpha = 0.5),
edge.curvature = 0,
edge.line.lty = 5)
print(grades)
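The idea of an edge that blends between the colors of its two endpoints can be sketched with base R color interpolation. This is only an illustration of a color gradient using `colorRampPalette()`; the variable names are ours, and it is not a description of netplot's internal rendering.

```r
# Colors of the two endpoints of a hypothetical edge
ego_col   <- "blue"
alter_col <- "red3"

# Build an interpolation function between the two endpoint colors
ramp <- colorRampPalette(c(ego_col, alter_col))

# Five steps along the edge, from the ego end to the alter end
steps <- ramp(5)
print(steps)
```

The first and last colors match the endpoints, and the steps in between shade gradually from one to the other, which makes it easy to spot edges that connect vertices from different groups.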
Using the same plot that we originally created, we can also adjust some of the aspects outside of vertices and edges.
Adjust bg.col to make background color slate gray.
Adjust sample.edges to select a proportion of the edges.
## Plot with no isolates
set.seed(3)
nplot(
net_with_no_isolates,
vertex.label=NULL,
bg.col = "slategray1",
sample.edges = .5)
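Sampling a proportion of edges is a simple way to thin a dense plot. The base R sketch below shows the idea on a toy edge list; the data and variable names are hypothetical, and netplot's own sampling for `sample.edges` may differ in its details.

```r
# Toy edge list with 10 edges (hypothetical data)
edges <- data.frame(from = 1:10, to = c(2:10, 1))

# Keep a random 50% of the edges; the seed makes the draw repeatable
set.seed(3)
prop <- 0.5
keep <- sample(nrow(edges), size = floor(prop * nrow(edges)))
edges_sampled <- edges[sort(keep), ]

nrow(edges_sampled)
```

Because the selection is random, setting the seed before sampling is what keeps the thinned plot identical from one run to the next.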
We can adjust things to get a different outcome.
Set skip.edges to TRUE to remove the edges altogether.
Adjust bg.col to misty rose.
Set zero.margins to TRUE.
## Plot with no isolates
set.seed(3)
nplot(
net_with_no_isolates,
vertex.label=NULL,
skip.edges = TRUE,
bg.col = "mistyrose",
zero.margins = TRUE
)
The middle school data set provides a basis where we can see what netplot can do. There are options to adjust the vertices, edges, and even other parameters.
For the second example, we use a data set from “Assessing Pathogen Transmission Opportunities: Variation in Nursing Home Staff-Resident Interactions” by Chang et al. It explores connections between patients and healthcare providers in a number of nursing homes across 7 states. There are 99 networks in the data set.
With this data, we will explore how multiple smaller networks can work together to tell a story and can be plotted using netplot.
First, the data needs to be loaded in, with the requisite packages we will be using.
# attaching packages
library(network)
library(devtools)
install_github("USCCANA/netplot")
library(netplot)
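# load() restores the saved objects into the workspace
# (here, the list of networks called 'networks' that is used below)
load("./data/nursing_home/network99_f1.RData")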
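One detail worth knowing about `load()` is that it restores objects under the names they were saved with and returns only those names, not the objects themselves. The sketch below demonstrates this with a temporary file; `networks_toy` is a hypothetical stand-in for the real data.

```r
# Save a toy object to a temporary .RData file (hypothetical stand-in)
networks_toy <- list(a = 1:3, b = 4:6)
tmp <- tempfile(fileext = ".RData")
save(networks_toy, file = tmp)
rm(networks_toy)

# load() restores the objects by name and returns those names
loaded_names <- load(tmp)

print(loaded_names)
print(exists("networks_toy"))
```

Capturing the return value is a handy way to find out what a .RData file contains when the object names are not documented.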
We are now ready to plot the data, as it is already in a clean, usable format. First, let’s pull out the first and second networks so we can take a closer look at them.
# Creates an empty list to store the networks
nets <- list()
# Sets a seed for reproducibility
set.seed(1231)
for (i in 1:2) { # Loops over the first two networks
# Retrieves the "is_actor" vertex attribute (TRUE for healthcare providers)
is_health_care_provider <- networks[[i]] %v% "is_actor"
nets[[i]] <- nplot(
networks[[i]],
# Colors HCP vertices gray and patient vertices red
vertex.color = ifelse(is_health_care_provider, "gray40", "red3"),
# Draws HCPs with 4 sides and patients with 10 sides (near-circular)
vertex.nsides = ifelse(is_health_care_provider == TRUE, 4, 10),
# Makes HCP vertices larger than patient vertices
vertex.size = ifelse(is_health_care_provider == TRUE, .25, .15),
vertex.size.range = c(.015, .065),
edge.width.range = c(.25, .5),
# Sets edge line breaks to 1 and colors edges light gray
edge.line.breaks = 1,
edge.color = ~ego(alpha = 1, col = "lightgray") + alter(alpha = 1, col = "lightgray"),
edge.curvature = pi / 6,
# Removes vertex labels
vertex.label = NULL
)
}
# Combines the 2 plots into a 1x2 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow = 1, ncol = 2)
Here, healthcare providers are represented by gray diamonds, while patients are represented by red circles.
Much like the previous example, we can use the different aspects of netplot to adjust how the graph looks.
Adjust vertex.color so providers are purple instead of gray and patients are pink instead of red.
Adjust vertex.nsides so providers are triangles and patients are hexagons.
Adjust edge.line.breaks to make the edges curved instead of straight.
Adjust edge.color so edges are now black instead of gray, and set alpha so the black is slightly transparent.
Adjust edge.curvature to make the edges more curved.
# Creates an empty list to store the networks
nets <- list()
# Sets a seed for reproducibility
set.seed(1231)
for (i in 1:2) { # Loops over the first two networks
# Retrieves the "is_actor" vertex attribute (TRUE for healthcare providers)
is_health_care_provider <- networks[[i]] %v% "is_actor"
nets[[i]] <- nplot(
networks[[i]],
# Colors HCP vertices purple and patient vertices pink
vertex.color = ifelse(is_health_care_provider, "purple", "pink"),
# Draws HCPs as triangles and patients as hexagons
vertex.nsides = ifelse(is_health_care_provider == TRUE, 3, 6),
# Makes HCP vertices larger than patient vertices
vertex.size = ifelse(is_health_care_provider == TRUE, .25, .15),
vertex.size.range = c(.015, .065),
edge.width.range = c(.25, .5),
# Sets edge line breaks to 6 and colors edges a slightly transparent black
edge.line.breaks = 6,
edge.color = ~ego(alpha = .8, col = "black") + alter(alpha = .8, col = "black"),
edge.curvature = pi / 3,
# Removes vertex labels
vertex.label = NULL
)
}
# Combines the 2 plots into a 1x2 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow = 1, ncol = 2)
Now that we understand what these networks look like at a closer level, we can plot them all for comparison.
# Creates an empty list to store the networks
nets <- list()
# Sets a seed for reproducibility
set.seed(1231)
for (i in 1:99) {
# Retrieves the "is_actor" vertex attribute (TRUE for healthcare providers)
is_health_care_provider <- networks[[i]] %v% "is_actor"
nets[[i]] <- nplot(
networks[[i]],
# Colors HCP vertices gray and patient vertices red
vertex.color = ifelse(is_health_care_provider, "gray40", "red3"),
# Draws HCPs with 4 sides and patients with 10 sides (near-circular)
vertex.nsides = ifelse(is_health_care_provider == TRUE, 4, 10),
# Makes HCP vertices larger than patient vertices
vertex.size = ifelse(is_health_care_provider == TRUE, .25, .15),
vertex.size.range = c(.015,.065),
edge.width.range = c(.25,.5),
# Sets edge line breaks to 1 and colors edges light gray
edge.line.breaks = 1,
edge.color = ~ego(alpha = 1, col = "lightgray") + alter(alpha = 1, col = "lightgray"),
edge.curvature = pi/6,
# Removes vertex labels
vertex.label = NULL
)
}
# Combines the 99 plots into an 11x9 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow=11, ncol=9)
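The 11x9 grid above is chosen by hand for 99 networks. If the number of networks changes, the number of rows can be derived from the column count. The sketch below uses a hypothetical helper, `grid_dims()`, which is our own illustration and not part of gridExtra or netplot.

```r
# Hypothetical helper: rows needed to fit n plots into a grid with ncol columns
grid_dims <- function(n, ncol) {
  c(nrow = ceiling(n / ncol), ncol = ncol)
}

# 99 plots in 9 columns
grid_dims(99, ncol = 9)

# A count that does not divide evenly still gets enough rows
grid_dims(100, ncol = 9)
```

The resulting values could be passed to the `nrow` and `ncol` arguments of `gridExtra::grid.arrange()` in place of the hard-coded numbers.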